csvread | R Documentation |
Fast specialized CSV file loader, as well as an implementation of a basic 64-bit integer class.
csvread provides functionality for loading large (10M+ lines) CSV and other delimited files, similar to read.csv, but typically faster and using less memory than the standard R loader. While not entirely general, it covers many common use cases when the types of columns in the CSV file are known in advance. In addition, the package provides a class int64, which represents 64-bit integers exactly when reading from a file. The latter is useful when working with 64-bit integer identifiers exported from databases. The CSV file loader supports common column types including integer, double, string, and int64, leaving further type transformations to the user.
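To illustrate why an exact 64-bit integer type matters for database identifiers, the following base R sketch shows what happens to a large key without one. It does not call the package itself; the identifier value and the variable names are hypothetical and chosen only for illustration.

```r
# Doubles carry a 53-bit significand, so integers above 2^53 start to collide.
big <- 2^53
collide <- (big + 1) == big   # two different integers map to the same double

# Base R integers are 32-bit, so a large identifier cannot be stored as integer.
id.text <- "9007199254740993"                     # hypothetical key, 2^53 + 1
id.int <- suppressWarnings(as.integer(id.text))   # out of 32-bit range, gives NA

# Keeping the key as a string is exact, but every comparison is a string comparison.
id.chr <- id.text

print(collide)
print(is.na(id.int))
print(identical(id.chr, "9007199254740993"))
```

Neither the double nor the 32-bit integer representation round-trips such a key, which is the gap a dedicated 64-bit integer class is meant to close.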
The code was tested on a Linux server using a CSV file with 10 million rows and 94 columns, mostly numeric. The size of the raw file was 3.5GB. The file was read from local storage. R version was 3.0.1.
The timing of the read.csv command was 672 seconds with peak memory usage of 16GB and final memory usage of 14.2GB, which fell to 5.8GB after a call to gc().
> system.time(df.r <- read.csv("benchmark10M.csv", stringsAsFactors = FALSE, header = FALSE, sep = ","))
   user  system elapsed
649.573  21.231 672.058
> dim(df.r)
[1] 10000000       94
The timing of the csvread function was 62 seconds (elapsed), with peak and final memory usage both at 4.7GB.
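The measurement approach behind these numbers can be tried at small scale with base R alone. The sketch below is not the original benchmark: it generates a tiny temporary file (the row and column counts are arbitrary assumptions), times read.csv with system.time, and uses gc() to report current and peak memory. Passing colClasses mirrors the idea of declaring column types in advance.

```r
# Build a small all-numeric CSV in a temporary file. This is a stand-in for
# the 10M-row benchmark file, which is not distributed with this page.
set.seed(1)
n.rows <- 1000
n.cols <- 5
tmp <- tempfile(fileext = ".csv")
m <- matrix(round(runif(n.rows * n.cols), 4), nrow = n.rows)
write.table(m, tmp, sep = ",", row.names = FALSE, col.names = FALSE)

# Reset the "max used" counters so gc() reports the peak for this read only.
invisible(gc(reset = TRUE))

# Time the read; declaring colClasses spares read.csv from guessing types.
timing <- system.time(
  df.r <- read.csv(tmp, header = FALSE, sep = ",",
                   stringsAsFactors = FALSE,
                   colClasses = rep("numeric", n.cols))
)

# gc() returns a matrix with current ("used") and peak ("max used") memory.
mem <- gc()

print(timing[["elapsed"]])
print(dim(df.r))
print(mem)

unlink(tmp)
```

On a file this small the timings are negligible; the point is only the pattern of resetting the counters, timing the load, and reading peak versus final memory from gc().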
Copyright (C) Collective, Inc. with portions Copyright (C) Jabiru Ventures LLC
Apache License, Version 2.0, available at http://www.apache.org/licenses/LICENSE-2.0
http://github.com/jabiru/csvread
install.packages("csvread")
library(devtools); devtools::install_github("jabiru/csvread")
Sergei Izrailev (contact email available at http://scr.im/izrg)